Skip to content

fix: capture DashScope multimodal media outputs#227

Open
sipercai wants to merge 1 commit into
mainfrom
fix/dashscope-multimodal-output-uri
Open

fix: capture DashScope multimodal media outputs#227
sipercai wants to merge 1 commit into
mainfrom
fix/dashscope-multimodal-output-uri

Conversation

@sipercai

Copy link
Copy Markdown
Collaborator

Description

This PR updates DashScope MultiModalConversation output parsing so media URLs returned in response content are captured as output URI parts. Image and video content items now become Uri parts in gen_ai.output.messages, matching the existing text/audio handling and allowing downstream multimodal processing to see generated media outputs.

Fixes # (N/A)

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • git diff --check
  • python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-dashscope/tests/test_multimodal_conversation.py -q
  • python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-dashscope/tests/test_image_synthesis.py -q
  • python -m ruff check instrumentation-loongsuite/loongsuite-instrumentation-dashscope
  • python "$PIPELINE_SKILL_DIR/scripts/check_loongsuite_pr_readiness.py" --repo .
  • tox -e precommit
  • tox -c tox-loongsuite.ini -e py313-test-loongsuite-instrumentation-dashscope-oldest,py313-test-loongsuite-instrumentation-dashscope-latest

Validation Evidence

Spec and Scope

  • Linked issue/spec: N/A; local bug report for DashScope MultiModalConversation image output capture.
  • Approved spec/comment: User requested direct implementation for response.output.choices[0].message.content = [{"image": "..."}].
  • Changed surface: loongsuite-instrumentation-dashscope output message extraction and tests.

Local Checks

Check Command Result Notes
Static readiness python "$PIPELINE_SKILL_DIR/scripts/check_loongsuite_pr_readiness.py" --repo . pass LoongSuite static readiness checks passed.
Precommit tox -e precommit pass Completed repository precommit gate.
Focused tests tox -c tox-loongsuite.ini -e py313-test-loongsuite-instrumentation-dashscope-oldest,py313-test-loongsuite-instrumentation-dashscope-latest pass Both environments reported 51 passed, 4 skipped.
Focused lint tox -c tox-loongsuite.ini -e lint-loongsuite-instrumentation-dashscope blocked Environment dependency download stalled; direct python -m ruff check instrumentation-loongsuite/loongsuite-instrumentation-dashscope passed.
Claude review Local Codex-Claude review loop pass No blocking findings remained after review/fix/re-review.
Privacy scan scan changed files for local paths, bearer tokens, API keys, and secret-looking keys pass No hits in changed files.

Real E2E Matrix

Scenario Status Command or Demo Evidence
non-streaming pass Live DashScope MultiModalConversation.call(model="wan2.7-image") smoke Real response returned image content and local span contained gen_ai.output.messages with modality=image URI.
streaming pass tox -c tox-loongsuite.ini -e py313-test-loongsuite-instrumentation-dashscope-oldest,py313-test-loongsuite-instrumentation-dashscope-latest Existing streaming multimodal tests passed in both focused environments.
concurrency blocked Not run for this narrow output-parser fix Run a bounded two-call DashScope smoke before marking ready if concurrency evidence is required.
agent/tool/ReAct N/A DashScope SDK media output parser has no agent/tool surface Not an agent framework integration.
tool-heavy N/A DashScope SDK media output parser has no tool-calling surface Not a tool orchestration integration.
error path pass Focused DashScope tox suites Existing error-handling tests passed in both focused environments.

Telemetry and Weaver

Check Status Command or Artifact Notes
Span tree / span kinds pass Live DashScope smoke plus ARMS readback ARMS readback confirmed a chat wan2.7-image LLM span with image URI in gen_ai.output.messages.
Content capture modes pass Focused unit test with SPAN_ONLY; existing no-content tests in DashScope suite New test asserts image URI is written when content capture is enabled; existing no-content tests continue to pass.
Concurrency isolation blocked Not run Run a bounded concurrent smoke before ready-for-review if required.
Weaver live-check blocked weaver registry live-check -r <loongsuite-semantic-conventions-registry> --advice-profile loongsuite-genai ... Not run in this draft preparation; ARMS readback was used for telemetry evidence.

CI

  • GitHub checks: pending PR creation.
  • Known unrelated failures: none identified.
  • Follow-up needed: rerun focused lint tox and Weaver live-check before moving this PR out of draft if maintainers require those gates.

Does This PR Require a Core Repo Change?

  • Yes. - Link to PR:
  • No.

Checklist:

See contributing.md for styleguide, changelog guidelines, and more.

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

@sipercai sipercai marked this pull request as ready for review June 23, 2026 06:39
@sipercai sipercai force-pushed the fix/dashscope-multimodal-output-uri branch from 265a89a to 8a17210 Compare June 23, 2026 07:16

@ralf0131 ralf0131 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

DashScope MultiModalConversation output parsing now captures image and video content items as Uri parts in gen_ai.output.messages, extending the existing text/audio handling to all media types returned by the API. The implementation correctly mirrors the input-side extraction pattern (text → image → audio → video) and the existing Uri construction used throughout the file.

Findings

No issues found. The new elif branches follow the established pattern exactly — same Uri(uri, modality, mime_type=None, type="uri") shape as the existing audio handling, consistent with both _extract_multimodal_input_messages and the image/video synthesis paths.

Test Coverage

  • Parametrized unit test covers image, audio, and video URI extraction (audio was previously untested at this level — nice bonus).
  • Mixed-content test (text + image) verifies part ordering is preserved.
  • End-to-end span attribute test validates the full pipeline from response parsing through to gen_ai.output.messages on the finished span.

Compatibility

Purely additive — new elif branches, no change to existing behavior or public API. Backward compatible.


Automated review by github-manager-bot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants